Icosahedral packing of RNA viral genomes 
Recent studies reveal that certain viruses package a portion of their genome in a manner that 
mirrors the icosahedral symmetry of the protein container, or capsid. Graph theoretical constraints 
forbid exact realization of icosahedral symmetry. This paper proposes a model for the determination 
of quasi-icosahedral genome structures and discusses the connection between genomic structure and 
viral assembly kinetics. 
An essential step in the assembly of any virus is the
structural reorganization of the single-stranded (ss) or
double-stranded (ds) RNA or DNA molecules that comprise
the genome into a form that fits inside the viral capsid.
This involves a well-known structural incongruity: the
protein capsid shells of nearly all sphere-like viruses
adopt icosahedral symmetry while, as discussed below, a
viral genome of fewer than twelve segments fundamentally
cannot assume an icosahedral conformation in view of its
one-dimensional, chain-like primary structure. In fact,
microscopy studies of the structure of the ds DNA genome
of bacteriophage viruses reveal a completely
non-icosahedral, spool-like organization. A semi-flexible
chain, such as ds DNA, that is confined inside a sphere
having an interior surface that repels the chain and a
radius comparable to or smaller than the chain persistence
length will, indeed, adopt such a spool-like structure as
its free energy minimum.

The problem of the interior organization of ss RNA
viral genomes is more delicate. Single-stranded nucleotide
chains have a much greater conformational flexibility than
duplex chains, so ss RNA chains are better able to adjust
to the icosahedral symmetry of the capsid than ds DNA
chains. The secondary structure of RNA molecules in
solution is characterized by paired segments connected by
branch points, unpaired "bubbles" and "hairpins" (see
Fig. 2). The various competing secondary structures
can be analyzed by statistical mechanical methods.
Moreover, unlike the case of the bacteriophage viruses,
for a significant number of ss RNA viruses, capsid
assembly requires the presence of the genome molecules.
This "co-assembly," which can take place spontaneously
by self-assembly, is due in part to non-specific
electrostatic attraction between RNA molecules and capsid
proteins. Attractive interactions between the genome
and the interior surface of the icosahedral capsid should
promote icosahedral order. X-ray diffraction studies
reveal that the outer layer of the genome of a number of
ss RNA viruses, such as Cowpea Chlorotic Mottle Virus
(CCMV), Flock House Virus (FHV), calicivirus, Cowpea
Mosaic Virus (CPMV), and Turnip Yellow Mosaic Virus
(TYMV), does indeed adopt at least partial icosahedral
symmetry.

A particularly striking illustration of icosahedral
ordering is provided by the Nodaviridae group of viruses.
The FHV and Pariacoto Nodaviridae viruses have so-called
"T=3" icosahedral capsids, as shown in Fig. 1.

FIG. 1: Image of the Pariacoto virus, reconstructed from
cryo-electron micrographs by Tang et al.

The genome of these viruses consists of two single-stranded
RNA molecules. One of these two molecules, RNA2, consists
of about 1,400 (FHV) or 1,300 (Pariacoto) bases and encodes
"protein α," a precursor of the capsid protein. This
molecule plays a key stabilizing role in capsid assembly,
and it is probably the part of the genome that is resolved
in the diffraction experiments.
As shown in Fig. 1, the icosahedral portion of
the RNA genome is distributed over a dodecahedral cage
formed by the low-curvature borders of twelve five-fold
pyramids that together make up the T=3 rhombic
triacontahedron capsids of the FHV and Pariacoto viruses.
Even though the genome is ss RNA, the portions shown
in Fig. 1 actually have the form of double helices with
the two strands oriented in the standard antiparallel way,
with opposite 3' to 5' directions. These paired
double-helical segments have a length per edge of about 10
base pairs in the FHV structural study and 25 base pairs in
the Pariacoto structural study. Given the 30 edges of a
dodecahedron, this means that for FHV about 43% of RNA2
(30 x 10 x 2 = 600 of roughly 1,400 bases) contributes to
the double-helical sections, and for Pariacoto essentially
100% of RNA2. If we assume that the 20 (unresolved)
vertices are also structurally identical, we would have to
connect together the 30 double-helical segments with 20
identical three-strand branch points (see Fig. 2) placed
at the vertices of the dodecahedron.
The resulting structure has true icosahedral symmetry
(apart from the different base-pair sequences for the
different segments), but it actually consists of 12
separate, interlocking RNA rings, which is inconsistent
with the structure of the Nodaviridae genome [14]. This
structural conflict is closely connected to a classical
result of graph theory, which states that it is impossible
to construct a one-dimensional path restricted to the edges
of either an icosahedral or a dodecahedral structure while
visiting every edge just once (a dodecahedron has the same
symmetry properties as an icosahedron). The assumption
that the 20 vertices are structurally identical, and thus
obey icosahedral symmetry, is therefore incorrect. As shown
below, the structural inhomogeneity of the vertices
actually provides us with important information on the
formation history of the virus. It is the aim of the
present article to present a statistical mechanical model
for quasi-icosahedral ordering of ss RNA viral genomes.
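Both graph-theoretical statements above can be checked directly. The following Python sketch builds the dodecahedral cage from standard vertex coordinates, confirms that every vertex has odd degree (so, by Euler's theorem, no path can traverse every edge exactly once), and traces the strands that result when all 20 junctions are identical. The junction rule used here (a strand arriving at a vertex leaves along the neighbouring branch in a fixed rotational sense) and all function names are illustrative assumptions, not taken from the structural studies.

```python
import math
from itertools import combinations

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def sub(p, q):
    return tuple(pi - qi for pi, qi in zip(p, q))


def dot(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))


def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def dodecahedron_rotation():
    """Map each vertex to its 3 neighbours, ordered counter-clockwise
    as seen from outside the polyhedron."""
    verts = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
    for a in (-1 / PHI, 1 / PHI):
        for b in (-PHI, PHI):
            verts += [(0, a, b), (a, b, 0), (b, 0, a)]

    nbrs = {i: [] for i in range(len(verts))}
    for i, j in combinations(range(len(verts)), 2):
        d = sub(verts[i], verts[j])
        if dot(d, d) < 2.0:  # squared edge length is about 1.53
            nbrs[i].append(j)
            nbrs[j].append(i)

    rotation = {}
    for v, (a, b, c) in nbrs.items():
        normal = cross(sub(verts[a], verts[v]), sub(verts[b], verts[v]))
        rotation[v] = [a, b, c] if dot(normal, verts[v]) > 0 else [a, c, b]
    return rotation


def trace_strands(rotation, turn):
    """Follow every directed edge through the junctions and return the
    length (in edges) of each closed strand. turn[v] is +1 or -1."""
    darts = [(u, v) for v in rotation for u in rotation[v]]
    seen, lengths = set(), []
    for start in darts:
        if start in seen:
            continue
        dart, length = start, 0
        while dart not in seen:
            seen.add(dart)
            u, v = dart
            w = rotation[v][(rotation[v].index(u) + turn[v]) % 3]
            dart = (v, w)
            length += 1
        lengths.append(length)
    return lengths


rotation = dodecahedron_rotation()
odd_vertices = sum(len(n) % 2 for n in rotation.values())
rings = trace_strands(rotation, {v: +1 for v in rotation})
print(odd_vertices, len(rings), sorted(set(rings)))
```

All 20 vertices have odd degree, and with identical junctions the strands close into 12 rings of five edges each, one around each pentagonal face, which is the set of 12 separate rings described above.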

On the basis of the above-mentioned X-ray structure 
studies, we set the following constraints on the secondary 
structure of RNA2: 

(i) the edges of the dodecahedron should be occupied 
by the rigid duplex segments. 

(ii) the branch points, bubbles, and hair-pins of the sec- 
ondary structure should be confined to the vertex 
regions. 

Under these constraints, there are only three possible
types of vertex structures, which can be represented
diagrammatically as shown in Fig. 2. Vertices of the first
type ("A") are pure branch points, the familiar
three-strand junctions of ds RNA and DNA. The arrows in the
figure indicate the 3' to 5' direction of the RNA2
molecule. Electrostatic repulsion between the duplex
branches of the junction favors 120° angles between the
branches, which have to be deformed by a moderate amount in
order to fit on the vertex of a dodecahedron. Cases A1 (a
"right-turn" vertex) and A2 (a "left-turn" vertex) are
related by a rotation of 180° about one of their branches.
The energy cost, $\Delta E_A$, is the same for the two
cases.

Next, a type "B" vertex consists of a combination
bubble/hairpin, as shown in Fig. 2, which has a total of
six variations. Note that the presence of the bubble
permits the sharp kink between the two connected duplex
sections required at a vertex of the dodecahedron. There
are now four RNA strands located at the vertex, so the
electrostatic energy, $\Delta E_B$, of a class B vertex is
presumably higher than that of a class A vertex. Finally,
for a class C vertex, three hairpin/stem-loops are located
at one vertex. A class C vertex requires that six RNA
strands are located at a single vertex and thus presumably
has the highest electrostatic energy ($\Delta E_C$).

FIG. 2: The three types of vertices. In the case of the "A"
type vertices (branch points) both a right-turn (A1) and a
left-turn (A2) vertex are shown. A type B vertex combines
a "bubble" and a "hairpin," while a type C vertex contains
only hairpins.

The energy cost of a particular secondary RNA2 structure
is, then,

$H = E_{\rm sol} + N_A \Delta E_A + N_B \Delta E_B + N_C \Delta E_C$    (1)

with $E_{\rm sol}$ the solution energy cost of the secondary
structure computed by the existing procedure, while $N_A$,
$N_B$, and $N_C$ denote the numbers of type A, B, and C
vertices. The non-specific electrostatic binding energy
contributes a constant energy that need not be included.
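As a bookkeeping illustration of Eq. (1), the following Python sketch evaluates H for a given set of vertex counts. The class and function names, and all numerical values, are hypothetical placeholders; the text does not supply values for the vertex energies or for the solution energy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VertexCosts:
    """Energy cost per vertex type (hypothetical units, e.g. kcal/mole)."""
    dE_A: float
    dE_B: float
    dE_C: float


def configuration_energy(E_sol, n_A, n_B, n_C, costs):
    """Eq. (1): H = E_sol + N_A*dE_A + N_B*dE_B + N_C*dE_C."""
    if n_A + n_B + n_C != 20:
        raise ValueError("a dodecahedral cage has exactly 20 vertices")
    return E_sol + n_A * costs.dE_A + n_B * costs.dE_B + n_C * costs.dE_C


# Placeholder numbers chosen only to illustrate the bookkeeping;
# they respect the assumed ordering dE_A < dE_B < dE_C.
costs = VertexCosts(dE_A=1.0, dE_B=2.0, dE_C=4.0)
H_mixed = configuration_energy(E_sol=-100.0, n_A=10, n_B=10, n_C=0, costs=costs)
H_all_A = configuration_energy(E_sol=-100.0, n_A=20, n_B=0, n_C=0, costs=costs)
print(H_mixed, H_all_A)
```

In practice the solution term would differ from one secondary structure to the next, so comparing candidates requires recomputing it for each structure rather than holding it fixed as in this toy call.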

Since, by assumption, class A vertices have the lowest
free energy, it is logical to start by constructing
RNA configurations that have only A-type junctions, in
which case there are no hairpins or stem-loops at all (i.e.
$N_B = N_C = 0$). That does not mean that the genome
has to be disconnected. Recall that there are two different
possible junctions, related by a 180° rotation. Since
we allow for vertex heterogeneity, we can use both junction
types and search for a singly-connected genome. In
order to do this, we represent the dodecahedral cage by
a planar graph. A singly-connected RNA molecule must
visit every edge of this graph twice, once in each
direction, in order to form the double-helical segments
on the edges. The existence of this type of path is not
forbidden by graph theory. The problem of identifying
a permissible path for a ss RNA molecule is equivalent
to finding a particular choice of A1 or A2 at every node
of the dodecahedron such that the vertices can be connected
by a continuous path. We will call this a Modified Euler
Path (MEP). In order to identify MEPs, we carried out a
computer search in which one of the nodes of the
dodecahedron was singled out as the starting (and ending)
point of the MEP, while all other vertices were designated
as either right-turn or left-turn vertices. This produced
262,144 different candidates for a MEP. Most of the
candidate walks were unsuccessful, because the walker
arrived at a point at which the conditions for the walk
were violated. However, in 54,272 of the cases a MEP was
generated, with a mean success rate
of 53/256 = 0.207... Two examples of such walks are
displayed in Fig. 3. Although the (3D) symmetry of these
RNA structures is very close to icosahedral, it is not
exact because the successful MEPs contain a certain number
of edges with an extra crossing, as indicated by a cross in
Fig. 3. This implies that the double-helical segments
that occupy these edges must have either an extra half
turn or a deficit of one half turn. The MEPs we generated
contained a minimum of eleven crossings and a maximum
of twenty, corresponding to the two cases shown in Fig. 3.
If the energy costs of type B and type C vertices are
significantly higher than that of a type A vertex, then
the pattern with the minimum number of extra crossings
should be the lowest free energy structure.

FIG. 3: Two examples of modified Euler paths covering the
edges of a dodecahedron. The vertex at which each path
begins and ends is in the upper central portion of the
dodecahedral graph. The path at the left contains the
minimum number of crossings, while the path at the right
contains the maximum number of crossings.
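The search statistics quoted above can be cross-checked with exact arithmetic. The short Python sketch below only verifies the numbers given in the text; reading 262,144 as 2^18 (eighteen independent two-way choices) is an observation about the number itself, not a statement about how the original search was organised.

```python
from fractions import Fraction

candidates = 262_144   # number of candidate walks quoted in the text
successes = 54_272     # number of candidates that produced a MEP

# A power of two has a single set bit, so its exponent is bit_length - 1.
binary_choices = candidates.bit_length() - 1
is_power_of_two = candidates == 2 ** binary_choices

# Exact success rate, reduced to lowest terms.
success_rate = Fraction(successes, candidates)
print(binary_choices, is_power_of_two, success_rate, float(success_rate))
```

The reduced fraction agrees with the quoted 53/256, so the three figures in the text are mutually consistent.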

However, it is well known that paths covering a
graph with overpasses and underpasses, as shown in
Fig. 3, are knotted. Even the MEP that has the
minimum number of crossings is still highly knotted.
RNA molecules in solution are never knotted (though
viral RNA may have "pseudo-knots"), and a knotted
RNA molecule released by a virus during infection
would not be functional. We thus must exclude knotted
RNA structures and hence class A quasi-icosahedral
order. Knot-free genomes can be constructed by demanding
that, unlike the structures shown in Fig. 3,
the secondary structure of RNA molecules inside a virus
has the same linearly-branched, circuit-free topology as
RNA molecules in solution. When a linearly-branched
structure decorates a dodecahedral graph, type B and
C vertices unavoidably appear. Given the presumed
higher free energy cost of a type C vertex, we will
construct linearly-branched genomes with only type A and
B vertices. Linearly-branched RNA secondary structures
characteristically contain an equal number of branches
and hairpin/stem-loop structures. That means that there
must be an equal number of A and B type vertices, so
$N_A = N_B$. The simplest example of such an A-B structure
involves singly-branched RNA molecules, i.e. the
molecule consists of a main chain and a certain number
of unbranched side chains. Secondary structures that
are nearly singly-branched are indeed encountered in the
spectrum of RNA2, with an energy of about 10 kcal/mole
above the ground state.

To cover the dodecahedral graph with a singly-branched
structure, every edge of the dodecahedron must
be a link between an A type and a B type vertex. If this
is the case, then the main chain of the RNA molecule
must visit every vertex of the dodecahedral graph once
and only once. The construction of such a route is a
well-known problem in graph theory, known as a
"Hamiltonian Path". We will focus on the special case of a
Hamiltonian Path that starts and ends on the same vertex
of a graph, which is a Hamiltonian Cycle; an example is
shown in the lower right-hand corner of Fig. 4. In this
case, any point on the graph can be treated as the starting
site. To every edge of the graph that is not part of the
Hamiltonian Path, we must assign an A-B or a B-A pair
of vertices, so for the Hamiltonian Cycle of relevance to
the ordering of RNA on an icosahedral virus, we obtain
$2^{10}$ different possible configurations (the cycle uses
20 of the 30 edges, leaving 10 side-chain edges). These
$2^{10}$ configurations form a subset of the allowed
secondary RNA structures in solution. For a given base-pair
sequence, the optimal secondary structure within this
subset can, in principle, be determined by minimizing H.
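The counting argument above can be reproduced with a small backtracking search. The Python sketch below is a minimal illustration, not the procedure used for the figures: it represents the dodecahedral graph as the generalized Petersen graph GP(10,2) (a standard description of that graph), finds one Hamiltonian cycle, and counts the edges left over for side chains. The vertex labelling and function names are assumptions made for this example.

```python
def dodecahedron_adjacency():
    """Dodecahedral graph as GP(10,2): outer ring 0-9, inner vertices 10-19."""
    adj = {v: set() for v in range(20)}
    for i in range(10):
        for u, w in ((i, (i + 1) % 10),             # outer ring
                     (i, 10 + i),                   # spoke
                     (10 + i, 10 + (i + 2) % 10)):  # inner star
            adj[u].add(w)
            adj[w].add(u)
    return adj


def hamiltonian_cycle(adj, start=0):
    """Depth-first backtracking search; returns the vertex order of one
    Hamiltonian cycle, or None if the graph has none."""
    path, visited = [start], {start}

    def extend():
        if len(path) == len(adj):
            return start in adj[path[-1]]  # can the cycle be closed?
        for w in sorted(adj[path[-1]]):
            if w not in visited:
                visited.add(w)
                path.append(w)
                if extend():
                    return True
                visited.remove(w)
                path.pop()
        return False

    return path if extend() else None


adj = dodecahedron_adjacency()
cycle = hamiltonian_cycle(adj)

all_edges = {frozenset((u, w)) for u in adj for w in adj[u]}
cycle_edges = {frozenset((cycle[k], cycle[(k + 1) % 20])) for k in range(20)}
side_edges = all_edges - cycle_edges

# Each side-chain edge can be oriented A-B or B-A.
n_configurations = 2 ** len(side_edges)
print(cycle, len(side_edges), n_configurations)
```

The main chain occupies the 20 cycle edges, and the two possible orientations of each of the 10 remaining edges give the 2^10 = 1024 configurations quoted above.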

The biological relevance of the secondary structure of
the viral genome is, in fact, intimately connected with
the assembly scenario of the virus, and this imposes
important constraints on the RNA configuration. It
has been well established for cylindrical ss RNA viruses
that the RNA molecule acts as a linear growth template.
Growth starts from a specific hairpin structure
(the "packaging signal") and proceeds by successive
addition of protein oligomers under the action of the
non-specific protein-RNA interaction. The length of the
RNA molecule determines the size of the virus. It is
currently not known whether main-chain-type ss RNA
molecules can also act as growth templates for the
icosahedral family of viruses, but the only microscopy study
of the growth kinetics of RNA/capsid co-assembly of an
icosahedral virus does report that growth starts from a
three-protein nucleus followed by the successive addition
of oligomer units. The observed co-assembly scenario
resembles the growth of a curved two-dimensional
(2D) protein crystal that closes in on itself. The
characteristic feature of conventional (slowly) growing
crystals is that they are compact. New units are
predominantly added to those available sites on the surface
of the crystal where the new unit can make a maximum number
of bonding contacts with units that are already
incorporated. The co-assembly condition requires the
presence of RNA material when new units are added. The
effect of this process is to even out surface roughness,
although with increased growth velocities, the growth
surface may roughen.

To examine whether the reported growth scenario is
consistent with the Hamiltonian path/cycle model developed
above, we assume that the Hamiltonian Path construction
provides the lowest accessible free energy structure, as
proposed in the previous section. The assembly history for
the Hamiltonian Cycle is shown in Fig. 4, in which we show
only the main-chain part. Edges that are not indicated by
heavy lines in the figure are occupied by a side chain
ending in a hairpin. Except for the first step, every added
pentamer bonds to at least two edges, one occupied by the
main chain and one occupied by a side chain. The growth
morphology imposed by the Hamiltonian path construction is,
then, consistent with compact 2D growth structures.




FIG. 4: Assembly of a viral capsid coordinated with the
tracing out of a Hamiltonian path by the
icosahedrally-ordered RNA. For the fully-assembled capsid,
shown in the lower right-hand corner of the figure, all
portions of the Hamiltonian path traced out by the RNA main
chain are indicated.



In summary, on the basis of elementary graph-theoretical
arguments, we propose that the quasi-icosahedral
organization of unsegmented ss RNA viral genomes is based
on the exploitation of the secondary structure of the viral
RNA to form a Hamiltonian path or cycle along a subset of
the edges of a polyhedron inscribed on the capsid surface,
with side branches covering the remaining edges. The
important role of the RNA main-chain structure would be
consistent with the main chain acting as a linear template
for the growth of the capsid during the self-assembly
process. Our arguments predict that quasi-icosahedral
organization should never be encountered for viruses with
unsegmented, fully duplexed ds DNA or RNA genomes.
Experimental tests should be straightforward. For instance,
FHV self-assembles under in vitro conditions. Self-assembly
of FHV with a duplexed RNA strand having the same length as
the FHV ss RNA should be difficult or impossible.
Determining the nature of the vertex inhomogeneity of actual
Nodaviridae would provide a decisive test of the model.
Because current structural determination methods impose
icosahedral symmetry, this is not yet possible, but we are
hopeful that developments in these methods will make such a
test feasible in the not-too-distant future.